What is this document ?

The best way to get a quick introduction to the package ‘spatialwarnings’ is through our freely-available paper, which describes the scientific background of the indicators and how they can be computed with the package. The purpose of this document is to answer more advanced questions and/or issues that came up and are related to the computation of indicators.

This document assumes some familiarity with the basic functions of the package and with R code in general.

How can I deal with NAs in the data ?

Put simply, there is no proper way to deal with missing values, this is why it’s up to the user of the package to deal with those before computing indicators. It depends on the meaning of those NAs.

If the matrix is has continuous values, then NAs generally represent missing measurements. In this case, interpolation may be an option, but computing indicators on interpolated data may be very dangerous (because it increases autocorrelation in the data, which is by itself an indicator !), see e.g. this blog post.

If the matrix has discrete (TRUE/FALSE) values, then NAs may represent either (i) missing measurements, and in this case you are in the same boat as above or (ii) a pixel that has been observed, but where the value represents the non-focal state (i.e. what should be in fact, a FALSE value). In this case, it should be safe to replace the NAs in the matrix by FALSE values.

For example, a matrix could contain TRUE values where the pixel contains a given cover of forest and NA if it contains any other thing (buildings, grassland, etc.). If you are interested in the spatial dynamics of forest patches, then it should be safe to replace those NAs by FALSE values and compute indicators.

How can I add another indicator or test new ones ?

‘spatialwarnings’ provides the function create_indicator, which allows the user to define new, arbitrary indicators, and use them as if they were *_sews functions (and thus benefit from significance testing, plotting functions, etc.).

The process is in two steps, we first define in R a function that computes our metric. This function should take a matrix as input, and return a single, numerical value. Here, we define a function that returns an approximation of the Kolmogorov complexity of a given matrix. This indicator has been suggested by Dakos et al. (2017) (references are at the bottom of this document).


# This function takes a matrix and a returns a single value. 
get_kbdm <- function(mat) { 
  require(acss)
  require(magrittr)
  
  # Split matrix 
  subsize <- 3
  xs <- seq(1, nrow(mat), by = subsize)
  ys <- seq(1, ncol(mat), by = subsize)
  allblockns <- expand.grid(seq.int(length(xs)-1), 
                            seq.int(length(ys)-1))
  all_substr <- Map(function(xblockn, yblockn) { 
                         as.vector(mat[xs[xblockn]:(xs[xblockn+1]-1), 
                                       ys[yblockn]:(ys[yblockn+1]-1)]) %>% 
                          as.integer() %>% 
                          paste(collapse = "")
                         }, 
                    allblockns[ ,1], allblockns[ ,2]) %>% unlist
  # Summarize the substrings 
  counts <- table(all_substr)
  counts <- data.frame(string = names(counts), 
                       multip = as.vector(counts), 
                       kctm = acss(names(counts), alphabet = 2)[ ,1])
  
  # Compute Kbdm 
  return( with(counts, sum(log2(multip) + kctm)) )
}

We then create an “indicator function” using the first function. This indicator function can then be used as any other function from the *_sews family. You just define how the metric is derived from data, ‘spatialwarnings’ handles all the significance-testing and plotting automatically.


# Create the indicator function
kbdm_sews <- create_indicator(get_kbdm)

# Apply this function on the serengeti dataset. We reduce the number of 
# permutations when testing significance because this indicator takes quite 
# some time to be computed. 
kbdm_indic <- kbdm_sews(serengeti)
kbdm_indic <- indictest(kbdm_indic, nperm = 49)

# Display the trends 
plot(kbdm_indic, along = serengeti.rain)


# Display text-based summary
summary(kbdm_indic)
#> Custom Spatial Early-Warnings: get_kbdm 
#> 
#>  Mat. # get_kbdm P>null    
#>       1     1559   0.00 ***
#>       2     1863   0.00 ***
#>       3     2227   0.00 ***
#>       4     2834   0.00 ***
#>       5     3012   0.00 ***
#>       6     3055   0.00 ***
#>       7     2790   0.00 ***
#>       8     3189   0.30    
#>       9     3307   0.46    
#>      10     3458   0.72    
#>      11     4148   0.98    
#>      12     5001   0.98    
#>      13     5692   0.98    
#> 
#>  Significance tested against 49 randomly shuffled matrices
#>  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Can I use Raster{Layer,Stack,Brick} objects with ‘spatialwarnings’ ?

Raster* objects are used by the package \raster to encapsulate raster data, for example data read from Geotiff images. These objects can contain a single band (RasterLayer objects) or multiple bands (RasterStack/RasterBrick objects). Each band of a raster objects is, by definition, a matrix of values.

Right now ‘spatialwarnings’ can not use Raster objects directly, but each individual band of \raster objects can be extracted as a matrix. These matrices can be then used in ‘spatialwarnings’. An example is provided below:

# Load packages
library(raster)

# It makes no sense computing indicators on those images, they just are
# example geotiffs
example_files <- c("https://download.osgeo.org/geotiff/samples/spot/chicago/SP27GTIF.TIF", 
  "https://download.osgeo.org/geotiff/samples/spot/chicago/UTM2GTIF.TIF")

# Read the list of files using functions from the package 'raster'
list_of_raster_objects <- lapply(example_files, function(f) {
  tmpfile <- tempfile()
  download.file(f, destfile = tmpfile)
  raster(tmpfile)
})

# Convert those to matrices
list_of_matrices_objects <- lapply(list_of_raster_objects, as.matrix)

# ...and use them in spatialwarnings
generic_sews(list_of_matrices_objects)
#> Warning in FUN(X[[i]], ...): Input matrix has continous values but will be
#> coarse-grained anyway. Set subsize=1 to disable coarse graining.

#> Warning in FUN(X[[i]], ...): Input matrix has continous values but will be
#> coarse-grained anyway. Set subsize=1 to disable coarse graining.
#> Generic Spatial Early-Warnings
#> 
#>  2 matrices (size: 929x699)
#> 
#>  Mat. # Mean Moran's I Skewness Variance
#>       1  115      0.64     0.68     1177
#>       2  115      0.63     0.68     1175
#> 
#> Use as.data.frame() to retrieve values in a convenient form

Of course, right now some metadata of the raster objects is lost in the process (extent, resolution, etc.). This information could be used by ‘spatialwarnings’ in a future release to adjust output (for example, adjust distance units when displaying the r-spectrum).

Multi-band rasters (RasterStack/RasterBrick objects) can also be converted to arrays (by using as.array). These arrays can then be summarized and/or classified (see below) to get a matrix that can be used in ‘spatialwarnings’.

How can I identify the periodicity in data ?

EWS available in assume that stress does not affect the periodicity of the spatial structure, as changes in periodicity may mask mask of alter trends in indicator values. Before computing the EWS, it is therefore needed to check for any possible periodicity in the input data. In this section, we contrast two images to show how the r-spectrum can be used to highlight periodicity (or its absence).


# Load packages
library(png)     # image-reading package
library(ggplot2) # plotting 
library(spatialwarnings)

# Read images
image_periodic <- readPNG('./patterned_bush_niger.png')
image_nonper   <- readPNG('./non_patterned_spain.png')

# The images have three bands (R/G/B). We transform them into a black 
# and white image using principal component analysis (PCA)
summarise_pca <- function(arr) { 
  values <- matrix(as.vector(arr), ncol = 3)
  pca <- prcomp(values)
  values <- pca[["x"]][ ,1]
  matrix(values, ncol = ncol(arr), nrow = nrow(arr)) 
}

# Transform into one-band (black and white image) and compute r-spectrum. 
image_periodic_pca <- summarise_pca(image_periodic)
rspec_periodic <- rspectrum(image_periodic_pca)

image_nonper_pca <- - summarise_pca(image_nonper)
rspec_nonper <- rspectrum(image_nonper_pca)

The periodic image below (tiger bush in Niger 1) displays a typical banded regular pattern (a). Its scale (in number of pixels) is reflected by a hump in the r-spectrum around a size of ~ 15px (b). This corresponds to the width of one vegetation band in the image (~30m in reality).


img <- ggplot(mat2longdf(image_periodic_pca)) + 
        geom_raster(aes(x = x, y = y, fill = value)) + 
        scale_fill_gradient(low = "black", high = "white") + 
        scale_y_reverse() + 
        theme_minimal() + 
        theme(legend.position = "none") + 
        labs(x = "Pixel coordinate", 
            y = "Pixel coordinate", 
            title = paste0(lettercount(), ") Image of periodic vegetation"))

rsp <- ggplot(rspec_periodic) + 
         geom_point(aes(x = dist, y = rspec), pch = 20) + 
         scale_x + scale_y + 
         theme_minimal() + 
         labs(x = "Distance (number of pixels)", 
             y = "r-spectrum value (power)", 
             title = paste0(lettercount(), ") r-spectrum of periodic image"))

gridExtra::grid.arrange(img, rsp, ncol = 2)

This image displays strong periodicity and the indicators in should be used with care to draw conclusions from it.

In contrast, an aperiodic image displays no such hump in its r-spectrum. The image below (c, arid grassland in Spain) is aperiodic, as shown by the decreasing r-spectrum (d). Indicators available in may (in principle) be used on this image.


img <- ggplot(mat2longdf(image_nonper_pca)) + 
        geom_raster(aes(x = x, y = y, fill = value)) + 
        scale_fill_gradient(low = "black", high = "white") + 
        scale_y_reverse() + 
        theme_minimal() + 
        theme(legend.position = "none") + 
        labs(x = "Pixel coordinate", 
            y = "Pixel coordinate", 
            title = paste0(lettercount(), ") Image of non-periodic vegetation"))

rsp <- ggplot(rspec_nonper) + 
         geom_point(aes(x = dist, y = rspec), pch = 20) + 
         scale_x + scale_y + 
         theme_minimal() + 
         labs(x = "Distance (number of pixels)", 
             y = "r-spectrum value (power)", 
             title = paste0(lettercount(), ") r-spectrum of non-periodic image"))

gridExtra::grid.arrange(img, rsp, ncol = 2)

Converting multi-band images

Raster images from remote sensing often come with multiple bands, because of the way images are stored digitally (e.g. red/blue/green bands), but also because multiple sensors can be used to capture a single image (e.g. a combination of visible/near-infrared sensors). These images are often represented as arrays in R (matrices that are extended with a third dimension, “depth”).

As spatialwarnings computes indicators on matrix objects, these arrays need to be first transformed into matrices (effectively arrays with no depth). This entails summarizing the multi-dimensional data of each pixel to a single value. In addition, some indicators (e.g. patch-based indicators) rely on classified, TRUE/FALSE data that defines which pixels belong to a patch and which do not. Again, the classification algorithm employed is likely to be very system-dependent, but we provide a possible example below.

In what follows, we use a case-study to show how to carry these two processes and apply the spatialwarnings package on images. Using an aerial image stored as an R/G/B PNG image, we classify each pixel as vegetated/non-vegetated or summarize the multi-channel pixel values to a single value. We then apply the indicators on the resulting transformed matrix.

Note that in many cases, other methods can be used to derive univariate indices from remote sensing data that may be more appropriate to describe the focal system. For example, a typical one-dimensional index for vegetation data is NDVI (Normalized Difference Vegetation Index), which measures the extent of green vegetation in a given pixel and is obtained as the normalized difference between the red a near-infrared bands of a multi-band image.

Classifying an RGB image to compute patch-based indicators

We first load the required packages and read the image in R.

library(png) # Read PNG images as arrays
library(plyr) # Manipulate data.frames
library(tidyr) # Manipulate data.frames
library(ggplot2) # Make plots
library(spatialwarnings) 

# Read image
datadir <- "../figs/images/"
img1 <- readPNG('./crau_quercus_encroachment.png')

# Display image 
grid::grid.raster(img1)

The vegetation in this image displays some patchiness, that can be characterized by spatialwarnings. img1 is a raster image with three (red/blue/green) channels that need to be be classified into TRUE (pixel is in a vegetation patch) or FALSE values in order to compute patch-based compute indicators.

# Convert the image to a data.frame 
img1_tab <- data.frame(expand.grid(x = seq.int(nrow(img1)), 
                                   y = seq.int(ncol(img1))), 
           as.data.frame(matrix(img1, ncol = 3)))
names(img1_tab) <- c('x', 'y', 'red', 'green', 'blue')

# A very bright object is present in the image, that distorts the results 
# further down and needs to be removed. We set its pixel values to NAs. 
luminance <- with(img1_tab, sqrt( 0.299*red^2 + 0.587*green^2 + 0.114*blue^2 ))
img1_tab[ ,c('red', 'green', 'blue')] <- 
  img1_tab[ ,c('red', 'green', 'blue')] * ifelse(luminance > .5, NA, 1)

We can display the distribution of each channel values:

# fix level order so colors match in graph
df <- gather(img1_tab, channel, value, red, green, blue)
df[ ,'channel'] <- factor(df[ ,'channel'], levels = c("red", "green", "blue"), 
                          ordered = TRUE) 
ggplot( df ) + 
  geom_density(aes(x = value, color = channel)) +
  theme_minimal() +
  labs(caption = "Distribution of channel values") + 
  scale_color_manual(values = c('red', 'green', 'blue', 'black')) 
#> Warning: Removed 564 rows containing non-finite values (stat_density).

Each individual channel can be also represented as a monochrome image:

ggplot(gather(img1_tab, channel, value, red, green, blue)) + 
  geom_raster(aes(x = x, y = y, fill = value)) + 
  facet_grid( ~ channel, labeller = label_both) + 
  coord_fixed() + 
  theme_minimal() + 
  scale_fill_gradient(low = "#000000", high = "#FFFFFF") + 
  labs(caption = "Monochrome representation of the three channels of the RGB image")

We can classify this image using an unsupervised k-means classification algorithm on the pixel data. Many other classification algorithms exist and may show better accuracy, but k-means is simple, generic and fast.


km <- kmeans(na.omit(img1_tab[ ,c('red', 'green', 'blue')]), 
             centers = 2)
img1_tab[ ,'clust'] <- NA
img1_tab[!is.na(img1_tab[ ,'red']), 'clust'] <- km[['cluster']]

# The number identifying each cluster is random in k-means: we make sure that 
# cluster 2 always has greener values, and thus always corresponds to potential 
# vegetation patches. 
img1_tab[ ,'clust'] <- with(img1_tab, as.integer(reorder(as.factor(clust), -green)))

ggplot(img1_tab) + 
  geom_raster(aes(x = x, y = y, 
                  fill = as.factor(clust))) + 
  coord_fixed() + 
  theme_minimal() + 
  scale_fill_manual(values = c('#F4EAA4', '#0A8E0B'))
#> Warning: Removed 188 rows containing missing values (geom_raster).

Cluster 2 roughly identifies vegetation patches, but many single-pixel values are misclassified by the k-means algorithm. However, they most likely pertain to the nearest patch: we apply a smoothing filter to get rid of them.


clust_filt <- with(img1_tab, 
                   gblur(matrix(!is.na(clust) & clust == 2, 
                                nrow = max(x), ncol = max(y)), 
                                sigma = 1.5)) > .2

img1_tab[ ,'clust_filt'] <- as.vector(clust_filt)

ggplot(img1_tab) + 
  geom_raster(aes(x = x, y = y, 
                  fill = as.factor(clust_filt))) + 
  coord_fixed() + 
  theme_minimal() + 
  scale_fill_manual(name = "Vegetation", 
                    values = c('#F4EAA4', '#0A8E0B'))

We can then apply the patch-based indicators on the resulting classified matrix.


psd_indic <- patchdistr_sews(clust_filt, fit_lnorm = TRUE)

summary(psd_indic)
#> Patch-based Early-Warnings results
#> 
#>  N(uniq.) Cover Percl.Full Percl.Empt  Type PLR
#>  944(254)  0.52      FALSE       TRUE lnorm 57%
#> 
#> Use as.data.frame() to retrieve values in a convenient form

plot_distr(psd_indic, best_only = FALSE) + 
  labs(title = "Results of patch-size distribution fitting")

Note that this brief demonstration uses k-means, but many classification methods exist, either supervised or unsupervised (Liu and Mason, 2016) and may show better results depending on the focal system.

Reducing the dimensionality of pixel data

Indicators require each pixel to have a single, uni-dimensional value. One option is to use the values of each channel independently (e.g. using red values only), but these may not represent the state of the system most accurately. Ideally, each pixel should carry a value that defines as accurately the state of the system.

Similar to classifications, supervised and unsupervised methods exist to summarize three-dimensional information to a single value. A simple generic unsupervised method is Principal Component Analysis (see e.g. Liu and Mason, 2016 for more details about its application on raster images), which we use here as example.

We carry out a PCA on the pixel data:


# We use the red/green/blue information and combine them using PCA
NApixel <- is.na(img1_tab[ ,'red'])
pca <- prcomp(img1_tab[!NApixel, c('red', 'green', 'blue')])

summary(pca)
#> Importance of components:
#>                           PC1     PC2     PC3
#> Standard deviation     0.1704 0.01893 0.01193
#> Proportion of Variance 0.9831 0.01212 0.00482
#> Cumulative Proportion  0.9831 0.99518 1.00000

# Import data back into data.frame
img1_tab[ ,'pca1'] <- NA
img1_tab[!NApixel,'pca1'] <- predict(pca)[ ,1]

ggplot(img1_tab) + 
  geom_raster(aes(x = x, y = y, fill = pca1)) + 
  coord_fixed() + 
  theme_minimal()

Results from PCA show that the first axis explains 98% of the variance in RGB value, so it summarizes very well the variations in pixel values. We can then apply indicators on these univariate values (we use here the example of spectral indicators).

Note that because there are some NA values (0.03% of all pixels) in the input image (the pixels where a very bright object is present), we need to interpolate them before computing the indicators. Note that interpolation increases the autocorrelation in a given matrix and indicators should be interpreted with caution on matrices where a large amount of pixels have been interpolated.


# Convert to matrix format then compute indicators. Because there are NA 
# values in the input image, we need to interpolate these values. We fill the 
# pixels with NAs with the mean value of their direct neighbors. 
pca_mat <- with(img1_tab, matrix(pca1, ncol = max(y), nrow = max(x)))
while ( any( is.na(pca_mat) ) ) { 
  for (i in seq.int(nrow(pca_mat))) { 
    if ( any(is.na(pca_mat[i, ])) ) { 
      for (j in seq.int(ncol(pca_mat))) { 
        
        # Bound check on the matrix
        i2 <- min(i, 1, nrow(pca_mat))
        j2 <- min(j, 1, ncol(pca_mat))
        if ( is.na(pca_mat[i, j]) ) { 
          pca_mat[i, j] <- mean(pca_mat[(i2-1):(i2+1), 
                                        (j2-1):(j2+1)])
        }
      }
    }
  }
} 

We can then compute the indicators on the resulting matrix:


# Compute spectral indicators 
ic <- spectral_sews(pca_mat, 
                    sdr_low_range = c(0, .2), 
                    sdr_high_range = c(.8, 1))
#> Warning in spectral_sews(pca_mat, sdr_low_range = c(0, 0.2), sdr_high_range
#> = c(0.8, : The matrix is not square: only a squared subset in the left part
#> will be used.
ic <- indictest(ic, nperm = 499)

# Display textual summary 
summary(ic)
#> Spectral Spatial Early-Warnings results
#> 
#>  Matrix # SDR Value P>null    
#>         1  37.72273 <2e-03 ***
#> 
#>  Significance tested against 499 randomly shuffled matrices
#>  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that while PCA may summarize well arbitrary variations in R/G/B pixel values, this does not mean that the resulting pixel values always capture well the variations in the system state. Before interpreting indicators, it is necessary to understand well how the numerical values in the image reflect the state of the ecological system. The use of indices known to reflect a specific aspect of the system (e.g. NDVI reflecting the amount of green vegetation cover) may be often more informative than using a generic algorithm to collapse multiple channels into one.

References

Dakos, V., and F. Soler-Toscano. 2017. Measuring complexity to infer changes in the dynamics of ecological systems under stress. Ecological Complexity 32:144–155.

Liu and Mason, 2016. Image Processing and GIS for Remote Sensing: Techniques and Applications. John Wiley & Sons, Ltd.